Comparing several models for perceptual long-term modeling of amplitude and phase trajectories of sinusoidal speech

Authors

  • Mohammad Firouzmand
  • Laurent Girin
  • Sylvain Marchand
Abstract

The so-called Long-Term (LT) modeling of sinusoidal parameters, proposed in previous papers, consists of modeling the entire time trajectory of the amplitude and phase parameters over large sections of voiced speech. This contrasts with the usual Short-Term models, which are defined on a frame-by-frame basis. In the present paper, we focus on a specific novel contribution to this general framework: the comparison of four different Long-Term models. These are a polynomial model, a model based on discrete cosine functions, and two hybrid models that combine discrete cosine functions with either sine functions or polynomials. Their performance is compared in terms of synthesized signal quality, data compression and modeling accuracy, and the relevance of this study to speech coding is shown.
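The sketch below makes the four-way comparison concrete. It is a minimal Python illustration, not the authors' implementation. It fits one real-valued parameter trajectory with each of the four model families by ordinary least squares. An example trajectory is the amplitude of one partial, sampled once per frame over a voiced section. It then reports the RMS modeling error under an equal coefficient budget.

Several choices here are assumptions made for illustration:
  • the exact basis definitions (normalized-time monomials, DCT-II-style cosines, and the cosine/sine and cosine/polynomial mixes);
  • the model orders;
  • the plain least-squares criterion;
  • all function names (`poly_basis`, `dct_basis`, `dct_sine_basis`, `dct_poly_basis`, `fit`, `compare`).

Phase trajectories would also need unwrapping before such a fit, which this sketch does not attempt.

```python
import math

# Illustrative sketch only: bases, orders and names are assumptions, not the paper's definitions.


def poly_basis(n, order):
    """Monomials t**k, k = 0..order, with t normalized to [0, 1] (needs n >= 2)."""
    ts = [i / (n - 1) for i in range(n)]
    return [[t ** k for k in range(order + 1)] for t in ts]


def dct_basis(n, order):
    """DCT-II-style cosines cos(pi * k * (i + 0.5) / n), k = 0..order."""
    return [[math.cos(math.pi * k * (i + 0.5) / n) for k in range(order + 1)]
            for i in range(n)]


def dct_sine_basis(n, order):
    """Hypothetical mix: cosines k = 0..order plus sines k = 1..order."""
    rows = []
    for i in range(n):
        x = math.pi * (i + 0.5) / n
        rows.append([math.cos(k * x) for k in range(order + 1)]
                    + [math.sin(k * x) for k in range(1, order + 1)])
    return rows


def dct_poly_basis(n, order, poly_order=1):
    """Hypothetical mix: cosines k = 0..order plus t**1..t**poly_order.

    The constant term is dropped from the polynomial part because the
    k = 0 cosine already provides it.
    """
    return [c + p[1:] for c, p in zip(dct_basis(n, order), poly_basis(n, poly_order))]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    size = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, size):
            f = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):
        x[r] = (m[r][size] - sum(m[r][c] * x[c] for c in range(r + 1, size))) / m[r][r]
    return x


def fit(basis, y):
    """Least-squares coefficients via the normal equations (B^T B) c = B^T y."""
    p = len(basis[0])
    btb = [[sum(row[i] * row[j] for row in basis) for j in range(p)] for i in range(p)]
    bty = [sum(row[i] * yi for row, yi in zip(basis, y)) for i in range(p)]
    coeffs = solve(btb, bty)
    model = [sum(c * v for c, v in zip(coeffs, row)) for row in basis]
    return coeffs, model


def rms_error(y, model):
    """Root-mean-square difference between the trajectory and its model."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, model)) / len(y))


def compare(y, order=4):
    """Fit y with each model family; return {name: (n_coeffs, rms_error)}.

    With the default order=4, every family uses 5 coefficients.
    """
    n = len(y)
    bases = {
        "polynomial": poly_basis(n, order),
        "dct": dct_basis(n, order),
        "dct+sine": dct_sine_basis(n, order // 2),
        "dct+poly": dct_poly_basis(n, order - 1, poly_order=1),
    }
    results = {}
    for name, basis in bases.items():
        coeffs, model = fit(basis, y)
        results[name] = (len(coeffs), rms_error(y, model))
    return results


if __name__ == "__main__":
    # Synthetic amplitude trajectory over a 100-frame voiced section (made-up data).
    n = 100
    amp = [1.0 + 0.3 * math.sin(2 * math.pi * 0.8 * i / n) + 0.1 * i / n for i in range(n)]
    for name, (k, err) in compare(amp).items():
        print(f"{name:10s} coeffs={k} compression={n / k:.0f}:1 rms={err:.2e}")
```

With `order=4`, every model uses five coefficients. The ratio of trajectory samples to coefficients therefore gives a crude, equal-budget indicator of data compression, and the RMS error stands in for modeling accuracy. The paper also evaluates the perceptual quality of the synthesized signal, which this sketch does not measure.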

Similar resources

Long term modeling of phase trajectories within the speech sinusoidal model framework

In this paper, the problem of modeling the trajectory of the phase of the speech signal is addressed within the context of the sinusoidal model of speech. A global or long-term model of the trajectory of the phase of the partials is proposed for each entire voiced section of speech, contrary to standard models, which are defined on a frame-by-frame basis. The complete analysis-modeling-synthesis pr...

Spectral modification for concatenative speech synthesis

Concatenative synthesis can produce high-quality speech but is limited to the allophonic variations and voice types that were captured in the database. It would be desirable to modify speech units to remove formant discontinuities and to create new speaking styles, such as hypo- or hyper-articulated speech. Unfortunately, manipulating the spectral structure often leads to degraded speech quality....

Perceptual audio modeling with exponentially damped sinusoids

This paper presents the derivation of a new perceptual model that represents speech and audio signals by a sum of exponentially damped sinusoids. Compared to a traditional sinusoidal model, the exponential sinusoidal model (ESM) is better suited to model transient segments that are readily found in audio signals. Total least squares (TLS) algorithms are applied for the automatic extraction of t...

Comparing the Contributions of Amplitude and Phase to Speech Intelligibility in a Vocoder-Based Speech Synthesis Model

Vocoder-based speech synthesis models have long been used to assess the contribution of acoustic cues to speech recognition. This study compared the perceptual contributions of amplitude and phase by using two types of stimuli, i.e., amplitude- and phase-based vocoded stimuli. The amplitude-based vocoded stimuli were synthesized by preserving amplitude fluctuation cue but discarding phase cue (i.e....

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

Publication date: 2005